{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "b18c5412",
   "metadata": {},
   "source": [
    "// Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.\n",
    "// SPDX-License-Identifier: MIT-0"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d61daf89-0485-47bf-aa96-280577bbadaf",
   "metadata": {},
   "source": [
    "# Inference with Customized Amazon Nova Models \n",
    "\n",
    "This notebook walk-through how to conduct inference on fine-tuned Amazon Nova models. We first demonstrate a single example followed by example scripts for running batch inference."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3cd5400c",
   "metadata": {},
   "source": [
    "# Prerequisites\n",
    "\n",
    "- Make sure you have executed 01_Amazon_Nova_Finetuning_Walkthrough.ipynb notebook.\n",
    "- Make sure you are using the same kernel and instance as 01_Amazon_Nova_Finetuning_Walkthrough.ipynb notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0147f270-3cf7-4a94-b806-9fc3aabda552",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install -qU -r requirements.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "ef96a213-b08e-41ec-a987-c1fdd4e3aa0b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<script>Jupyter.notebook.kernel.restart()</script>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# restart kernel for packages to take effect\n",
    "from IPython.core.display import HTML\n",
    "HTML(\"<script>Jupyter.notebook.kernel.restart()</script>\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dae87342-ea92-4b49-a705-184027a76356",
   "metadata": {},
   "source": [
    "# Setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "id": "2e1cbb30-eedc-4079-a8b3-5eb21dd88e53",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import boto3 \n",
    "from botocore.config import Config\n",
    "import sys\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import json\n",
    "import time \n",
    "import concurrent.futures\n",
    "import shortuuid\n",
    "import tqdm\n",
    "import os"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "id": "742d8e7f-bb20-4283-9b81-fd67fb15c88d",
   "metadata": {},
   "outputs": [],
   "source": [
    "my_config = Config(\n",
    "    region_name = 'us-east-1', \n",
    "    signature_version = 'v4',\n",
    "    retries = {\n",
    "        'max_attempts': 5,\n",
    "        'mode': 'standard'\n",
    "    })\n",
    "\n",
    "bedrock = boto3.client(service_name=\"bedrock\", config=my_config)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4d10ca06-93a4-4552-aec4-6684ff9d2464",
   "metadata": {},
   "source": [
    "# Construct model input "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a8bb3880",
   "metadata": {},
   "source": [
    "Before invoking the customized models, we need to construct model input following the format needed by Amazon Nova models."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "id": "abc59170-bc16-466b-a78e-92e13972108e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# API setting constants\n",
    "API_MAX_RETRY = 16\n",
    "API_RETRY_SLEEP = 10\n",
    "API_ERROR_OUTPUT = \"$ERROR$\"\n",
    "\n",
    "\n",
    "def create_nova_messages(prompt):\n",
    "    \"\"\"\n",
    "    Create messages array for Amazon Nova models from conversation\n",
    "\n",
    "    Args:\n",
    "    conv (object): Conversation object containing messages\n",
    "\n",
    "    Returns:\n",
    "    list: List of formatted messages for Amazon Nova model\n",
    "    \"\"\"\n",
    "    messages = []\n",
    "    \n",
    "    messages.append({\n",
    "            \"role\": \"user\",\n",
    "            \"content\": [{\"text\": prompt}]\n",
    "        })\n",
    "\n",
    "    return messages\n",
    "\n",
    "def chat_completion_aws_bedrock_nova(model, conv, temperature, max_tokens, aws_region=\"us-east-1\"):\n",
    "    \"\"\"\n",
    "    Call AWS Bedrock API for chat completion using Amazon Nova models\n",
    "\n",
    "    Args:\n",
    "    model (str): Model ID\n",
    "    conv (object): Conversation object containing messages\n",
    "    temperature (float): Temperature parameter for response generation\n",
    "    max_tokens (int): Maximum tokens in response\n",
    "    api_dict (dict, optional): API configuration dictionary\n",
    "    aws_region (str, optional): AWS region, defaults to \"us-west-2\"\n",
    "\n",
    "    Returns:\n",
    "    str: Generated response text or error message\n",
    "    \"\"\"\n",
    "\n",
    "    # Configure AWS client \n",
    "    bedrock_rt_client = boto3.client(\n",
    "            service_name='bedrock-runtime',\n",
    "            region_name=aws_region,\n",
    "        )\n",
    "\n",
    "    \n",
    "    # Retry logic for API calls\n",
    "    for _ in range(API_MAX_RETRY):\n",
    "        try:\n",
    "            # Create messages from conversation\n",
    "            messages = create_nova_messages(conv)\n",
    "            inferenceConfig = {\n",
    "                \"max_new_tokens\": max_tokens,\n",
    "                \"temperature\": temperature, \n",
    "            }\n",
    "\n",
    "            # Prepare request body\n",
    "            model_kwargs = {\"messages\": messages,\n",
    "                            \"inferenceConfig\": inferenceConfig}\n",
    "            body = json.dumps(model_kwargs)\n",
    "\n",
    "            # Call Bedrock API\n",
    "            response = bedrock_rt_client.invoke_model(\n",
    "                body=body,\n",
    "                modelId=model,\n",
    "                accept='application/json',\n",
    "                contentType='application/json'\n",
    "            )\n",
    "\n",
    "            # Parse response\n",
    "            response_body = json.loads(response.get('body').read())\n",
    "            \n",
    "            output = response_body['output']['message']['content'][0]['text']\n",
    "            break\n",
    "\n",
    "        except Exception as e:\n",
    "            print(type(e), e)\n",
    "            ## Uncomment time.sleep if encounter Bedrock invoke throttling error\n",
    "            # time.sleep(API_RETRY_SLEEP)\n",
    "\n",
    "    return output"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7342ceae-3f3d-4c4c-b2cb-967a2df68483",
   "metadata": {},
   "source": [
    "# Inference on customized Amazon Nova model (individual example)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "id": "485a3c00",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "When anonymous operational metrics are enabled for an Amazon Elastic File System (EFS) file system, AWS collects and sends specific details to Amazon CloudWatch. These metrics are designed to help you monitor the performance and usage of your EFS file systems without requiring you to provide personally identifiable information. Here are the specific details that are collected and sent:\n",
      "\n",
      "1. **File System-Level Metrics**:\n",
      "    - **FileSystemSize**: The total size of the file system in bytes.\n",
      "    - **FreeStorageCapacity**: The amount of available storage capacity in bytes.\n",
      "    - **BurstingCredits**: The number of bursting credits available for the file system.\n",
      "    - **BurstBalance**: The current balance of bursting credits.\n",
      "    - **ThroughputMode**: The throughput mode of the file system (e.g., Bursting, Provisioned).\n",
      "\n",
      "2. **Network Metrics**:\n",
      "    - **NetworkThroughput**: The amount of network throughput in bytes per second.\n",
      "    - **NetworkLatency**: The average latency of network requests in milliseconds.\n",
      "\n",
      "3. **I/O Metrics**:\n",
      "    - **ReadOperations**: The number of read operations.\n",
      "    - **WriteOperations**: The number of write operations.\n",
      "    - **TotalIOPS**: The total input/output operations per second.\n",
      "    - **ReadIOPS**: The number of read IOPS.\n",
      "    - **WriteIOPS**: The number of write IOPS.\n",
      "\n",
      "4. **Mount Target Metrics**:\n",
      "    - **NetworkThroughputPerMountTarget**: The network throughput per mount target in bytes per second.\n",
      "    - **NetworkLatencyPerMountTarget**: The average network latency per mount target in milliseconds.\n",
      "\n",
      "5. **Additional Metrics**:\n",
      "    - **MountTargetCount**: The number of mount targets associated with the file system.\n",
      "    - **BackupSize**: The size of the backup in bytes (if backups are enabled).\n",
      "\n",
      "These metrics are sent to CloudWatch in 1-minute intervals, allowing you to monitor and analyze the performance and usage of your EFS file systems over time. The data is anonymized and aggregated to ensure that no personally identifiable information is shared.\n",
      "\n",
      "To enable these metrics, you can configure your EFS file system to send anonymous operational data to AWS through the AWS Management Console, AWS CLI, or AWS SDKs. Once enabled, AWS will start collecting and sending these metrics to CloudWatch automatically.\n"
     ]
    }
   ],
   "source": [
    "# [Important!] Update `base_model_id` to `provisioned_model_id` based on the previous jupyter notebook\n",
    "base_model_id = 'amazon.nova-lite-v1:0'\n",
    "temperature = 0.2\n",
    "max_tokens = 1024\n",
    "\n",
    "ques = \"What specific details are collected and sent to AWS when anonymous operational metrics are enabled for an Amazon EFS file system?\"\n",
    "\n",
    "print(chat_completion_aws_bedrock_nova(base_model_id, ques, temperature+0.01, max_tokens, aws_region=\"us-east-1\"))      "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "731bbeea",
   "metadata": {},
   "source": [
    "# Batch inference with customized Amazon Nova model "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "21c027cf",
   "metadata": {},
   "source": [
    "In this section, we provide code snippets for efficiently running batch inference using the same `chat_completion_aws_bedrock_nova` function as above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "id": "f0c444dd",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[' \"What specific details are collected and sent to AWS when anonymous operational metrics are enabled for an Amazon EFS file system?', \"What's required for a successful AWS CloudFormation launch?\"]\n"
     ]
    }
   ],
   "source": [
    "# Load test cases \n",
    "question_file = f\"dataset/test_set/question_short.jsonl\"\n",
    "\n",
    "questions = []\n",
    "with open(question_file, \"r\", encoding=\"utf-8\") as ques_file:\n",
    "    for line in ques_file:\n",
    "        if line:\n",
    "            questions.append(json.loads(line))\n",
    "\n",
    "print(questions[0][\"turns\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "id": "65ff0707",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Helper function that helps organize answers from customized Amazon Nova model\n",
    "\n",
    "def get_answer(\n",
    "    question: dict, model_id: str, num_choices: int, max_tokens: int, temperature: float, answer_file: str\n",
    "):\n",
    "\n",
    "    choices = []\n",
    "\n",
    "    for i in range(num_choices):\n",
    "        conv = \"\"\n",
    "        turns = []\n",
    "        \n",
    "        for j in range(len(question[\"turns\"])):\n",
    "            conv += question[\"turns\"][j]\n",
    "            output = chat_completion_aws_bedrock_nova(model_id, conv, temperature+0.01, max_tokens, aws_region=\"us-east-1\")        \n",
    "            turns.append(output)\n",
    "\n",
    "        choices.append({\"index\": i, \"turns\": turns})\n",
    "\n",
    "    # Dump answers\n",
    "    ans = {\n",
    "        \"question_id\": question[\"question_id\"],\n",
    "        \"answer_id\": shortuuid.uuid(),\n",
    "        \"model_id\": model,\n",
    "        'use_rag': False,\n",
    "        \"choices\": choices,\n",
    "        \"tstamp\": time.time(),\n",
    "    }\n",
    "\n",
    "    os.makedirs(os.path.dirname(answer_file), exist_ok=True)\n",
    "    with open(answer_file, \"a\", encoding=\"utf-8\") as f:\n",
    "        f.write(json.dumps(ans) + \"\\n\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "id": "d97c7f3a-8640-4aa1-b11a-73b8053101e9",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Output to dataset/model_answer/amazon.nova-lite-v1:0_V2.jsonl\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 10/10 [01:19<00:00,  8.00s/it]\n"
     ]
    }
   ],
   "source": [
    "# Run batch inference and save model output\n",
    "\n",
    "## [Important!] Update `base_model_id` to `provisioned_model_id` based on the previous jupyter notebook\n",
    "model_id = 'amazon.nova-lite-v1:0'\n",
    "num_choices = 1 \n",
    "max_tokens = 1024\n",
    "temperature = 0.2\n",
    "    \n",
    "answer_file = f\"dataset/model_answer/{model_id}_V2.jsonl\"\n",
    "print(f\"Output to {answer_file}\")\n",
    "\n",
    "with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:\n",
    "    futures = []\n",
    "    for question in questions:\n",
    "        future = executor.submit(\n",
    "            get_answer,\n",
    "            question,\n",
    "            model_id,\n",
    "            num_choices,\n",
    "            max_tokens,\n",
    "            temperature,\n",
    "            answer_file,\n",
    "        )\n",
    "        futures.append(future)\n",
    "\n",
    "    for future in tqdm.tqdm(\n",
    "        concurrent.futures.as_completed(futures), total=len(futures)\n",
    "    ):\n",
    "        future.result()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ed39ae22-f78c-4cf6-8746-cc12aea739e9",
   "metadata": {},
   "source": [
    "# [Optional] Plot training loss"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "20cb7c64",
   "metadata": {},
   "source": [
    "Optionally, you can also plot training loss using the `step_wise_training_metrics.csv` file generated from the finetuning job. This csv file and other model artifacts can be found under Amazon Bedrock -> Custom model -> Custom model name -> Output data (S3 location)  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "54362a24-9222-412f-b89e-60913fbb52dd",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "def plot_training_loss(input_file, output_file):\n",
    "    ''' This function plots training loss using the default model output file 'step_wise_training_metrics.csv' generated from the finetuning job'''\n",
    "    \n",
    "    # Read the CSV file\n",
    "    df = pd.read_csv(input_file)\n",
    "    \n",
    "    # Create the plot\n",
    "    plt.figure(figsize=(10, 6))\n",
    "    plt.plot(df['step_number'], df['training_loss'], 'b-', linewidth=2)\n",
    "    \n",
    "    # Customize the plot\n",
    "    plt.title('Training Loss vs Step Number', fontsize=14)\n",
    "    plt.xlabel('Step Number', fontsize=12)\n",
    "    plt.ylabel('Training Loss', fontsize=12)\n",
    "    plt.grid(True, linestyle='--', alpha=0.7)\n",
    "    \n",
    "    # Add some padding to the axes\n",
    "    plt.margins(x=0.02)\n",
    "    \n",
    "    # Save the plot\n",
    "    plt.savefig(output_file, dpi=300, bbox_inches='tight')\n",
    "    plt.close()\n",
    "    \n",
    "    print(f\"Plot saved as {output_file}\")\n",
    "\n",
    "\n",
    "# Example usage\n",
    "\n",
    "plot_training_loss(input_file = 'model_training_loss/aws-ft-nova-lite/step_wise_training_metrics_epoch5_lr_1e-06.csv', \n",
    "                   output_file = 'model_training_loss/aws-ft-nova-lite/training_loss_epoch5_lr_1e-06.png')\n",
    "\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "f1aa38b9",
   "metadata": {},
   "source": [
    "# Conclusion\n",
    "\n",
    "In this and last notebook, we provided a detailed walkthrough on how to fine-tune, host, and conduct inference with customized Amazon Nova through the Amazon Bedrock API. Please refer to the [guidelines](https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-guidelines.html) for more tips on fine-tuning Amazon Nova models to meet your need."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d6a8606f-049a-4c7b-939c-c5c44bfda210",
   "metadata": {},
   "source": [
    "# Delete provisioned throughput"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2d98273e",
   "metadata": {},
   "source": [
    "<b>Warning</b>: Please make sure to delete providsioned throughput as there will cost incurred if its left in running state, even if you are not using it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4341f7e8-e85b-4c88-9f77-11b2afc463de",
   "metadata": {},
   "outputs": [],
   "source": [
    "bedrock.delete_provisioned_model_throughput(provisionedModelId=provisioned_model_id)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "conda_pytorch_p310",
   "language": "python",
   "name": "conda_pytorch_p310"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
